Deriving a Lexicon for a Precision Grammar from Language Documentation Resources: A Case Study of Chintang
Abstract
Language documentation projects typically invest a lot of effort in creating digitized lexical resources, which are used in the creation of dictionaries and in the glossing of collected texts. We present and evaluate a methodology for repurposing such a lexical resource developed for Chintang (ISO639-3: ctn), a language of Nepal, for use with a precision implemented grammar developed in the DELPH-IN formalism. The target lexicon, when combined with a set of morphological rules, achieves 57% type-level coverage and 50% token-level coverage of held-out texts, while maintaining a feature-level accuracy F-measure of 70%. As lexicon development is typically one of the most expensive aspects of creating a precision grammar, this represents a significant savings of effort.
Title and abstract in German
Ableitung des Lexikons für eine Präzisionsgrammatik aus dokumentationslinguistischen Ressourcen anhand einer Fallstudie zum Chintang
Typische Sprachdokumentationsprojekte investieren viel Zeit in den Aufbau digitaler lexikalischer Ressourcen, die für die Erstellung von Wörterbüchern und für die Glossierung von Korpustexten genutzt werden können. Dieser Vortrag stellt eine alternative Verwendung eines elektronischen Wörterbuchs vor, das für das Chintang (ISO639-3:ctn), eine bedrohte Sprache Nepals, entwickelt wurde. Die Kombination dieses Wörterbuchs mit einer nach dem DELPH-IN-Formalismus entwickelten Präzisionsgrammatik in Form morphologischer Regeln kann erste Texte auf der Type-Ebene zu 57% und auf der Token-Ebene zu 50% abdecken, wobei auf der Merkmalsebene ein F-Maß von 70% gewahrt wird. Da der Aufbau lexikalischer Ressourcen zu den zeitintensivsten Komponenten der Entwicklung einer Präzisionsgrammatik gehört, bringt diese Methode eine signifikante Zeitersparnis mit sich.
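The abstract reports three figures: type-level coverage, token-level coverage, and a feature-level F-measure. The sketch below illustrates how such figures are commonly computed. It is not the authors' evaluation code. The function names, the `is_covered` predicate (a stand-in for "the lexicon plus morphological rules can analyze this word form"), and the toy data are all assumptions made for illustration.

```python
from collections import Counter


def coverage(tokens, is_covered):
    """Return (type_coverage, token_coverage) for a list of word tokens.

    is_covered is a hypothetical predicate standing in for a real analyzer.
    """
    counts = Counter(tokens)  # maps each word type to its token frequency
    covered_types = [w for w in counts if is_covered(w)]
    # Type coverage: each distinct word form counts once.
    type_cov = len(covered_types) / len(counts)
    # Token coverage: each word form counts once per occurrence.
    token_cov = sum(counts[w] for w in covered_types) / sum(counts.values())
    return type_cov, token_cov


def f_measure(predicted, gold):
    """Balanced F-measure between two sets of features."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data (invented): six tokens, four distinct types.
toy_tokens = ["a", "b", "a", "c", "a", "d"]
toy_lexicon = {"a", "b"}
type_cov, token_cov = coverage(toy_tokens, lambda w: w in toy_lexicon)

# Toy feature sets (invented) for one lexical entry.
toy_f = f_measure({"noun", "sg"}, {"noun", "pl"})
```

The two coverage figures can differ because token coverage weights each word form by its frequency in the text, whereas type coverage counts every distinct form once.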
Similar resources
Towards Creating Precision Grammars from Interlinear Glossed Text: Inferring Large-Scale Typological Properties
We propose to bring together two kinds of linguistic resources—interlinear glossed text (IGT) and a language-independent precision grammar resource—to automatically create precision grammars in the context of language documentation. This paper takes the first steps in that direction by extracting major-constituent word order and case system properties from IGT for a diverse sample of languages.
Learning Grammar Specifications from IGT: A Case Study of Chintang
We present a case study of the methodology of using information extracted from interlinear glossed text (IGT) to create actual working HPSG grammar fragments using the Grammar Matrix, focusing on one language: Chintang. Though the results are barely measurable in terms of coverage over running text, they nonetheless provide a proof of concept. Our experience report reflects on the ways in whi...
A Supervised Method for Constructing Sentiment Lexicon in Persian Language
Due to the increasing growth of digital content on the internet and social media, sentiment analysis is one of the emerging fields. This problem, which deals with information extraction and knowledge discovery from textual data using natural language processing, has attracted the attention of many researchers. Construction of a sentiment lexicon as a valuable language resource is one of the imp...
Bootstrapping Deep Lexical Resources: Resources for Courses
We propose a range of deep lexical acquisition methods which make use of morphological, syntactic and ontological language resources to model word similarity and bootstrap from a seed lexicon. The different methods are deployed in learning lexical items for a precision grammar, and shown to each have strengths and weaknesses over different word classes. A particular focus of this paper is the r...
Universal Grammar and Chaos/Complexity Theory: Where Do They Meet And Where Do They Cross?
The present study begins by sketching "Chaos/Complexity Theory" (C/CT) and its application to the nature of language and language acquisition. Then, the theory of "Universal Grammar" (UG) is explicated with an eye to C/CT. Firstly, it is revealed that C/CT may or may not be allied with a theory of language acquisition that takes UG as the initial state of language acquisition for ...